Discarded Seafloor Objects: A Hidden World Beneath the Waves¶

A Project by Allison Buchanan and Robyn Marowitz¶

Radioactive Dumping

image credit: pxfuel.com

Introduction to the Sea Floor¶

The ocean's depths are a realm of mystery and discovery, holding wonders that have intrigued scientists and explorers for centuries.
Sadly, the marine environment has also been a dumping ground for discarded objects for decades. An astonishing variety of discarded objects rests quietly beneath the waves. These forgotten artifacts range from ancient shipwrecks to modern-day debris, from gigantic swaths of fishing nets to today's waste. Some of these objects are downright hazardous: one example is an estimated 55,000 containers of radioactive waste dumped overboard at various Pacific Ocean sites between 1946 and 1970 (epa.gov, 2022).

New technology allows us to investigate these objects and predict their outcomes. We can employ machine learning to estimate what might happen to an item: will it sink, be buried, or drift? And if it drifts, where will it go? These methods help us locate hotspots of discarded objects and make more informed environmental, ecological, and navigational decisions.

The Process¶

Our chosen study site was an area off the coast of the Carolinas. We collected a wide range of data to see if we could identify the fate of these remnants of history on the ocean floor. The data ranged from shipwrecks, obstructions, artificial reefs, and oyster sanctuaries to currents and waves, sediment type, nightlights fishing data, and more. Once we downloaded the data, we clipped it to our area of interest, then cleaned and conformed the datasets to one another. Once the data is wrangled, it is ready for a machine learning model. The output of this model allows us to analyze the state of the ocean floor in our location and infer the ultimate fate of discarded objects.

Goals¶

The goal of this project is to build a model that predicts, with high confidence, the movement of discarded objects on the sea floor. This matters because movement can release toxic waste, destroy the organisms living on the sea floor, and more.

We created a bounding box GeoDataFrame of the study area, in EPSG:4326, to clip our imported data with. It is shown here:

In [2]:
box = {'geometry': [Polygon([(-77.121369, 36.541466),
                             (-70.760165, 36.541466),
                             (-71.511922, 32.087495),
                             (-79.317663, 31.036502)])]}
bbox_gdf = gpd.GeoDataFrame(box, crs='EPSG:4326')
bbox_gdf.bounds
Out[2]:
minx miny maxx maxy
0 -79.317663 31.036502 -70.760165 36.541466
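As a hedged illustration of the clipping idea: our actual `clip_gdf` helper uses the polygon geometry, but a hypothetical `clip_to_bounds` function (not part of our codebase) can show the principle using only the box's rectangular extent.

```python
import pandas as pd

# Rectangular extent of the study-area bounding box (from bbox_gdf.bounds).
MINX, MINY, MAXX, MAXY = -79.317663, 31.036502, -70.760165, 36.541466

def clip_to_bounds(df):
    """Keep only rows whose lat/lon fall inside the bounding-box extent."""
    inside = df['lon'].between(MINX, MAXX) & df['lat'].between(MINY, MAXY)
    return df[inside]

# Toy points: the first is inside the box, the second is too far north.
points = pd.DataFrame({'lat': [33.0, 40.0], 'lon': [-75.0, -75.0]})
print(clip_to_bounds(points))
```

Note this rectangle check keeps some points the polygon would reject, since our bounding box is not axis-aligned.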

Our Data¶

AWOIS Wrecks & Electronic Nautical Charts¶

This data comes from the Wrecks and Obstructions Database, managed by the Office of Coast Survey, as well as from Electronic Nautical Charts (ENCs are what the AWOIS database moved toward in 2016). There are over 10,000 wrecks and obstructions managed through this database. We initially got this data as an Excel spreadsheet, then decided to use the KMZ (KML) file that was also available, for consistency with the reef data.

The geodataframe shows our georeferenced locations as well as any available size and other information about each object. The following scatter plot shows the locations of various wrecks in our region of interest, with our bounding box coordinates along the x and y axes. It can be difficult to gain perspective from these dots without a map in the background, which is why we also created a folium map further on.

In [3]:
# ENC Wrecks gdf

ENC_gdf = read_kml(ENC_wrecks_kml_pth)
ax = ENC_gdf.plot()
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title('ENC Wrecks')

# Convert ENC gdf to refined df
ENC_df = create_refined_df(ENC_gdf)
ENC_df['description'] = 'ENC Wreck'
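Our `read_kml` helper wraps the geopandas KML import. As a rough standard-library-only illustration of what such a helper extracts, a hypothetical `placemark_coords` function (not part of our codebase) can pull the Placemark coordinates out of a KML document:

```python
import xml.etree.ElementTree as ET

KML_NS = '{http://www.opengis.net/kml/2.2}'

def placemark_coords(kml_text):
    """Extract (lon, lat) pairs from each Placemark's <coordinates> element."""
    root = ET.fromstring(kml_text)
    coords = []
    for pm in root.iter(f'{KML_NS}Placemark'):
        node = pm.find(f'.//{KML_NS}coordinates')
        if node is not None and node.text:
            # KML stores "lon,lat,alt"; we only need the first two fields.
            lon, lat, *_ = node.text.strip().split(',')
            coords.append((float(lon), float(lat)))
    return coords

# Toy single-placemark KML document for demonstration.
sample = (
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
    '<Placemark><Point><coordinates>-75.5,34.2,0</coordinates></Point></Placemark>'
    '</Document></kml>'
)
print(placemark_coords(sample))  # [(-75.5, 34.2)]
```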

Artificial Reefs and Oyster Sanctuaries¶

The North Carolina Division of Marine Fisheries maintains 43 ocean artificial reefs and 25 estuarine reefs. These reefs help promote ecological balance and provide homes for many wild and farmed animals. The materials for these reefs range from sunken vessels and concrete pipes to concrete structures and reef balls, small habitats designed to act as an artificial reef (North Carolina Department of Environmental Quality). We brought this point data into our project as a KML.

A Reef Ball provides habitat for dozens of types of marine life:

reef_image.png

image credit: Wikipedia commons

The map below shows point data for the artificial reefs, confirmed wrecks from electronic nautical charts, and wrecks and obstructions from the AWOIS database in our area of interest. The interactive quality of this map allows the user to zoom in further and further on clusters of objects until identifying information about each point displays. This is the data we can use to see how our inferential data, such as currents and sediment, is impacting physical objects.

In [4]:
total_df = pd.concat([sc_reef_df, nc_reef_df, AWOIS_df, ENC_df])


# Map our concatenated dataframe
m = folium.Map(location=[32.087495, -71.511922], zoom_start=6)

marker_cluster = MarkerCluster().add_to(m)

for index, row in total_df.iterrows():
    folium.Marker(
        location=[row.lat, row.lon],
        popup=row.description,
        icon=folium.Icon(color="black")
    ).add_to(marker_cluster)

m
Out[4]:

Bathymetry & Waves¶

Bathymetry is the study of the depth of water over the ocean floor. One can think of it as underwater, 3-D cartography. It gives us information about depth and terrain and is very useful for producing nautical charts (oceanservice.noaa.gov). Understanding the currents and the strength of the waves, and therefore their impact on the objects below, helps our model interpret activity in the ocean's depths. While all of our data until now has been point data, this is continuous data. Our bathymetric data is open-source government terrain data: a 1-km resolution land surface digital elevation model from USGS.

Below is an example bathymetric map; such maps resemble heat maps.

bathymetry_image

image credit: Ardhuin, Fabrice & Herbers, T & Vledder, Gerbrant & Watts, Kristen & Jensen, R. & Graber, Hans. (2015). Slanting-Fetch-JPO-2007-Ardhuin-etal.pdf.

Below are our bathymetry data frames that were then joined and clipped to our bounding box.

In [5]:
bathymetry_path = os.path.join(et.io.HOME, 'seafloor-objects', 'data', 'bathymetry', 'bathymetry1.tif')
bathymetry_path_2 = os.path.join(et.io.HOME, 'seafloor-objects', 'data', 'bathymetry','bathymetry2.tif')

bathyemtry1_df = open_tif_to_df(bathymetry_path, "bathymetry")
bathyemtry2_df = open_tif_to_df(bathymetry_path_2, "bathymetry2")
In [6]:
bathyemtry_df = pd.concat([bathyemtry1_df, bathyemtry2_df], axis=1, join='inner')
bathyemtry_df
Out[6]:
spatial_ref bathymetry spatial_ref bathymetry2
band y x
1 36.545 -76.125 0 1.0 0 1.97
-76.075 0 6.0 0 1.97
-76.025 0 -2.0 0 1.97
-75.975 0 1.0 0 2.09
-75.925 0 5.0 0 2.32
... ... ... ... ... ...
32.045 -74.725 0 -4436.0 0 7.61
-74.675 0 -4502.0 0 7.67
-74.625 0 -4585.0 0 7.74
-74.575 0 -4650.0 0 7.81
-74.525 0 -4706.0 0 7.88

6794 rows × 4 columns

In [7]:
clipped_bathyemtry_gdf = clip_gdf(bathyemtry_df)
clipped_bathyemtry_gdf
Out[7]:
band y x spatial_ref bathymetry spatial_ref bathymetry2 geometry
6706 1 32.045 -78.875 0 -337.0 0 6.65 POINT (-78.87500 32.04500)
6707 1 32.045 -78.825 0 -360.0 0 6.61 POINT (-78.82500 32.04500)
6708 1 32.045 -78.775 0 -358.0 0 6.57 POINT (-78.77500 32.04500)
6709 1 32.045 -78.725 0 -351.0 0 6.53 POINT (-78.72500 32.04500)
6710 1 32.045 -78.675 0 -388.0 0 6.49 POINT (-78.67500 32.04500)
... ... ... ... ... ... ... ... ...
91 1 36.445 -74.775 0 -133.0 0 6.30 POINT (-74.77500 36.44500)
59 1 36.495 -74.775 0 -93.0 0 6.27 POINT (-74.77500 36.49500)
61 1 36.495 -74.675 0 -740.0 0 6.53 POINT (-74.67500 36.49500)
62 1 36.495 -74.625 0 -1271.0 0 6.67 POINT (-74.62500 36.49500)
60 1 36.495 -74.725 0 -277.0 0 6.40 POINT (-74.72500 36.49500)

5645 rows × 8 columns

In [8]:
clipped_bathyemtry_df = pd.DataFrame(
        columns=[
            'lat', 'lon', 'bathymetry', 'bathymetry2'])
clipped_bathyemtry_df['lon'] = clipped_bathyemtry_gdf.geometry.x
clipped_bathyemtry_df['lat'] = clipped_bathyemtry_gdf.geometry.y
clipped_bathyemtry_df['bathymetry'] = clipped_bathyemtry_gdf['bathymetry']
clipped_bathyemtry_df['bathymetry2'] = clipped_bathyemtry_gdf['bathymetry2']
clipped_bathyemtry_df
Out[8]:
lat lon bathymetry bathymetry2
6706 32.045 -78.875 -337.0 6.65
6707 32.045 -78.825 -360.0 6.61
6708 32.045 -78.775 -358.0 6.57
6709 32.045 -78.725 -351.0 6.53
6710 32.045 -78.675 -388.0 6.49
... ... ... ... ...
91 36.445 -74.775 -133.0 6.30
59 36.495 -74.775 -93.0 6.27
61 36.495 -74.675 -740.0 6.53
62 36.495 -74.625 -1271.0 6.67
60 36.495 -74.725 -277.0 6.40

5645 rows × 4 columns
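As a hedged illustration of the shape our `open_tif_to_df` helper produces: the helper itself reads a GeoTIFF, but flattening a raster band into one value column indexed by y/x can be sketched with a tiny made-up elevation grid (all values here are invented).

```python
import numpy as np
import pandas as pd

# Toy 2x2 "elevation" grid standing in for a raster band.
elev = np.array([[1.0, 6.0], [-2.0, 5.0]])
ys, xs = [36.545, 36.495], [-76.125, -76.075]

# Flatten the grid into a long dataframe indexed by (y, x),
# one row per cell, mirroring the bathymetry frames above.
df = pd.DataFrame(
    [(y, x, elev[i, j]) for i, y in enumerate(ys) for j, x in enumerate(xs)],
    columns=['y', 'x', 'bathymetry'],
).set_index(['y', 'x'])
print(df)
```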

Marine Mishaps¶

This data is part of the MISLE (Marine Information for Safety and Law Enforcement) database, maintained by the United States Coast Guard. It covers the broad realm of marine accidental and deliberate pollution, marine casualties, and a host of other shipping and port accidents within United States waters (USCG, US Coast Guard Marine Safety Management System (MSMS), 2008; now MISLE). This database helps catch wrecks that the AWOIS/ENC databases might miss.

When we brought in the marine_mishaps data, we realized that it was gridded data not conformed to lat/lon like our other datasets. Below you can see that the x and y values in the dataframe are clearly not latitude and longitude. We had no experience with this and were therefore not able to use it in our training data frame; we simply didn't have the time to figure out how to translate that grid into coordinates. This was a lesson about how sometimes a dataset just doesn't work with a project or with other datasets.

In [9]:
marine_mishaps_path = os.path.join(et.io.HOME, 'seafloor-objects', 'data', 'marine_mishaps', 'marine_mishaps1.tif')

marine_mishaps_df = open_tif_to_df(marine_mishaps_path, 'marine_mishaps')
marine_mishaps_df = marine_mishaps_df.reset_index()

final_marine_mishaps_df = pd.DataFrame(columns=['x', 'y', 'marine_mishaps'])

final_marine_mishaps_df['x'] = marine_mishaps_df['x']
final_marine_mishaps_df['y'] = marine_mishaps_df['y']
final_marine_mishaps_df['marine_mishaps'] = marine_mishaps_df['marine_mishaps']
final_marine_mishaps_df

# NOTE: these x,y values do not seem to be valid latitude and longitude?
Out[9]:
x y marine_mishaps
0 490.5 453.5 0.000000
1 491.5 453.5 0.000000
2 492.5 453.5 0.000000
3 493.5 453.5 0.000000
4 494.5 453.5 0.000000
... ... ... ...
162040 640.5 0.5 -422.718211
162041 641.5 0.5 -427.369704
162042 642.5 0.5 -432.028833
162043 643.5 0.5 -436.695355
162044 644.5 0.5 -441.369038

162045 rows × 3 columns
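In hindsight, the translation likely requires the raster's affine geotransform, which GeoTIFF metadata carries. A minimal sketch of the idea, assuming a hypothetical upper-left origin and pixel size (none of these numbers come from our data):

```python
# Hypothetical geotransform: origin at the raster's upper-left corner
# (x0, y0), with pixel width dx (eastward) and height dy (southward, negative).
x0, y0, dx, dy = -79.5, 36.5, 0.01, -0.01

def pixel_to_lonlat(col, row):
    """Map a pixel-center column/row pair to longitude/latitude."""
    return x0 + col * dx, y0 + row * dy

# The half-pixel offsets in the dataframe (e.g. 490.5, 453.5) suggest
# the x/y values are already pixel centers.
print(pixel_to_lonlat(490.5, 453.5))
```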

Nightlights Fishing Lights¶

"Nightlights" data is being closely tracked by scientists in various marine fields around the world. It helps us track where shipping vessels are, and especially helps when they go "dark," that is, off-radar. Many illegal dumping and fishing fleets have been caught with this type of data. The data is captured by the Suomi National Polar-orbiting Partnership satellite, which carries an extremely sensitive camera, the Visible Infrared Imaging Radiometer Suite (VIIRS), that images the entire earth's surface every night (Global Fishing Watch, 2023).

We have a large range of nightlights data frames that we were able to concatenate together, which helped us create a folium map of all of the night lights detected within our bounding box. This dataset is a series of TIFFs that were concatenated into one large data frame with an inner join (shown below). All 11 of the initial frames are now individual columns attached to our bounding box coordinates. From this one large nightlights data frame we were able to create an interactive map of all the vessels traversing the area after dark.

Some of these TIFFs, when converted to dataframes, had ~17,518,801 rows. Once we dropped NA values, performed the inner join, and clipped to our bounding box, we were left with only 43 rows.

night_lights_image image credit: Wikipedia commons
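The drastic shrink comes largely from the inner join: only index values present in every frame survive. A toy pandas illustration with made-up values:

```python
import pandas as pd

# Two small frames sharing only part of their index.
a = pd.DataFrame({'night_lights_0': [1.0, 2.0, 3.0]}, index=[0, 1, 2])
b = pd.DataFrame({'night_lights_1': [4.0, 5.0]}, index=[1, 2])

# axis=1 with join='inner' keeps only the index labels common to both,
# so three rows and two rows combine into just two.
joined = pd.concat([a, b], axis=1, join='inner')
print(joined)
```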

In [11]:
clipped_night_lights_df
Out[11]:
lat lon night_lights_11 night_lights_10 night_lights_9 night_lights_8 night_lights_7 night_lights_6 night_lights_5 night_lights_4 night_lights_3 night_lights_2 night_lights_1 night_lights_0
1274 33.766667 -78.116667 151.0 1.0 0.662252 1.798955 1.219512 2.469136 4.0 162.0 5.986400 9.390680 2.0 164.0
1272 33.766667 -78.129167 186.0 1.0 0.537634 2.716830 1.242236 0.613497 1.0 163.0 4.347810 3.493350 2.0 161.0
1275 33.766667 -78.112500 180.0 2.0 1.111111 5.788855 2.439024 3.048780 5.0 164.0 4.403730 3.298330 4.0 164.0
1269 33.770833 -78.116667 156.0 4.0 2.564103 2.785440 5.161290 8.536586 14.0 164.0 3.702729 6.649830 8.0 155.0
1268 33.770833 -78.120833 171.0 2.0 1.169591 2.983884 3.048780 4.848485 8.0 165.0 4.723375 6.417840 5.0 164.0
1267 33.770833 -78.129167 175.0 1.0 0.571429 5.039730 0.632911 0.584795 1.0 171.0 5.402920 2.577010 1.0 158.0
1270 33.770833 -78.112500 155.0 1.0 0.645161 5.721062 5.263158 5.194805 8.0 154.0 2.589825 1.724290 9.0 171.0
1271 33.770833 -78.108333 149.0 3.0 2.013423 3.398130 2.702703 1.875000 3.0 160.0 4.667990 3.453410 4.0 148.0
1264 33.775000 -78.112500 170.0 7.0 4.117647 4.695288 4.137931 2.500000 4.0 160.0 3.443953 3.956602 6.0 145.0
1266 33.775000 -78.104167 163.0 4.0 2.453988 14.539200 0.684932 0.662252 1.0 151.0 5.736020 5.295053 1.0 146.0
1265 33.775000 -78.108333 169.0 1.0 0.591716 3.365845 2.531646 2.395210 4.0 167.0 5.020945 4.153960 4.0 158.0
1263 33.775000 -78.116667 158.0 4.0 2.531646 4.783703 4.285714 5.405406 8.0 148.0 3.272580 3.042767 6.0 140.0
1262 33.779167 -78.112500 152.0 4.0 2.631579 4.537225 1.250000 0.625000 1.0 160.0 4.622490 3.012492 2.0 160.0
1261 33.779167 -78.120833 154.0 1.0 0.649351 2.091390 0.641026 0.657895 1.0 152.0 3.779270 2.133860 1.0 156.0
1259 34.620833 -76.545833 181.0 2.0 1.104972 1.419470 0.649351 2.424242 4.0 165.0 2.353261 4.291419 1.0 154.0
1258 34.625000 -76.545833 155.0 3.0 1.935484 2.488250 2.409639 2.484472 4.0 161.0 2.433551 4.476250 4.0 166.0
1253 34.629167 -76.666667 143.0 3.0 2.097902 1.457000 0.606061 0.561798 1.0 178.0 0.926423 2.515010 1.0 165.0
1254 34.629167 -76.662500 149.0 3.0 2.013423 3.243350 1.250000 2.395210 4.0 167.0 4.313485 6.208063 2.0 160.0
1257 34.629167 -76.541667 144.0 1.0 0.694444 1.608288 2.439024 0.763359 1.0 131.0 1.006730 1.130850 4.0 164.0
1256 34.629167 -76.545833 154.0 2.0 1.298701 4.203930 2.054795 3.636364 6.0 165.0 2.029050 3.613420 3.0 146.0
1255 34.629167 -76.658333 170.0 2.0 1.176471 1.752170 1.149425 1.190476 2.0 168.0 5.904060 5.275055 2.0 174.0
1250 34.633333 -76.662500 152.0 5.0 3.289474 3.089263 1.438849 5.194805 8.0 154.0 4.062875 8.129422 2.0 139.0
1251 34.633333 -76.650000 168.0 3.0 1.785714 1.708440 0.581395 0.662252 1.0 151.0 3.089410 2.613020 1.0 172.0
1252 34.633333 -76.641667 173.0 2.0 1.156069 1.533820 0.613497 2.684564 4.0 149.0 7.465155 6.197050 1.0 163.0
1249 34.637500 -76.666667 164.0 1.0 0.609756 0.963250 0.613497 0.621118 1.0 161.0 1.994900 1.814420 1.0 163.0
1248 34.637500 -76.691667 175.0 1.0 0.571429 5.231735 1.342282 0.591716 1.0 169.0 1.433960 2.131550 2.0 149.0
1247 34.637500 -76.704167 140.0 1.0 0.714286 4.654010 0.625000 0.602410 1.0 166.0 0.881691 1.153210 1.0 160.0
1246 34.662500 -76.620833 170.0 1.0 0.588235 1.820485 1.190476 0.571429 1.0 175.0 1.008930 1.833460 2.0 168.0
1245 34.662500 -76.641667 146.0 1.0 0.684932 5.059670 1.226994 1.428571 2.0 140.0 0.963060 1.243540 2.0 163.0
1241 35.075000 -76.233333 143.0 1.0 0.699301 1.348180 0.694444 0.680272 1.0 147.0 3.390600 2.383630 1.0 144.0
1240 35.108333 -76.212500 161.0 1.0 0.621118 1.097730 0.729927 0.724638 1.0 138.0 4.485700 1.722900 1.0 137.0
1244 34.970833 -76.691667 161.0 1.0 0.621118 2.693530 1.388889 0.584795 1.0 171.0 1.307070 9.872280 2.0 144.0
1243 34.983333 -76.841667 145.0 2.0 1.379310 1.216530 0.625000 0.699301 1.0 143.0 1.565910 1.261425 1.0 160.0
1242 34.987500 -76.683333 161.0 1.0 0.621118 1.594133 1.169591 0.671141 1.0 149.0 0.519020 1.332450 2.0 171.0
1238 35.245833 -75.983333 144.0 1.0 0.694444 4.325350 0.662252 0.606061 1.0 165.0 0.805869 2.607260 1.0 151.0
1236 35.370833 -75.995833 158.0 1.0 0.632911 2.152510 0.675676 0.649351 1.0 154.0 0.626740 1.581050 1.0 148.0
1234 36.454167 -75.825000 163.0 1.0 0.613497 6.238815 1.408451 0.628931 1.0 159.0 0.980255 1.546770 2.0 142.0
1239 35.241667 -75.437500 173.0 1.0 0.578035 3.247720 0.645161 0.625000 1.0 160.0 1.212550 0.968463 1.0 155.0
1237 35.295833 -75.483333 153.0 1.0 0.653595 0.728569 0.649351 0.598802 1.0 167.0 1.858990 0.973554 1.0 154.0
1235 35.779167 -75.545833 168.0 7.0 4.166667 1.015940 0.595238 1.935484 3.0 155.0 2.316347 3.551703 1.0 168.0
1277 33.758333 -78.120833 152.0 1.0 0.657895 4.296430 0.617284 0.729927 1.0 137.0 9.000360 7.137050 1.0 162.0
1276 33.758333 -78.125000 174.0 1.0 0.574713 1.493260 0.649351 1.242236 2.0 161.0 1.791252 1.315410 1.0 154.0
1273 33.766667 -78.120833 163.0 3.0 1.840491 1.516883 1.818182 0.645161 1.0 155.0 10.077600 4.415436 3.0 165.0
In [13]:
# night lights folium map
m1
Out[13]:

Currents & Waves/dbSEABED Data:¶

Both of these datasets are supplied by INSTAAR, a CU Boulder-based collaborative that aims to combine a large range of seabed data to provide ocean-bottom information. The dbSEABED data system contains data about seabed texture, composition, acoustic properties, color, geology, and biology (Instaar.colorado.edu, 2007). Their data can be pointwise, rasterized (cell-wise), or formatted as ESRI grids or GeoTIFFs. The collective also provides easy-to-access KML versions of its data that can be viewed on Google Earth (their download links are currently not active). This data acts as our prediction data in a machine learning model: data provided to a model that has been trained on our other joined data frame, which comprises all the other data above and is known as the training data.

Below we have an example of cleaning a dataset, with some methods that help make it machine-learning-ready; in this example, we use the dbSEABED data. The process of cleaning data is complex, and there is no one-size-fits-all approach since every dataset is different, but these are some useful steps. We reset the index first to make sure we have a defined index, then drop any duplicate entries and fill in NA values. We also compute a z-score for each value, relative to the mean and standard deviation of the dataset, and use it to drop outliers. Finally, we show some scaling with MinMaxScaler(), a scikit-learn function that helps prepare data for scikit-learn models by scaling it to a fixed range. For a single dataset this is not terribly important, but once datasets have been merged it is a very useful tool.

Notice in the first cell below that the frame is almost 6,500 rows long. After performing some data cleaning, its shape is cut down to 960 rows.

In [14]:
db_path_2 = os.path.join(et.io.HOME, 'seafloor-objects', 'data', 'dbseabed_materials', 'dbseabed_materials2.tif')
db_2_df = open_tif_to_df(db_path_2, 'dbseabed_materials2')
db_2_df.shape
Out[14]:
(6474, 2)
In [15]:
# Cleaning up our data
db_2_df.reset_index()
db_2_df.duplicated().sum()
db_2_df.drop_duplicates(inplace=True)

db_2_df['dbseabed_materials2'].unique()
db_2_df['dbseabed_materials2'].replace('Unknown', np.nan, inplace=True) 

db_2_df['z_score'] = zscore(db_2_df['dbseabed_materials2'])
db_2_df = db_2_df.loc[db_2_df['z_score'].abs() < 3]

# remove z-score column
db_2_df.drop('z_score', axis=1, inplace=True)

db_2_df.fillna(db_2_df.mean(), inplace=True)

scaler = MinMaxScaler()
db_2_df[['dbseabed_materials2', 'spatial_ref']] = scaler.fit_transform(db_2_df[['dbseabed_materials2', 'spatial_ref']])

db_2_df.shape
Out[15]:
(960, 2)
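For reference, the min-max scaling that MinMaxScaler applies per column can be computed by hand with the usual formula:

```python
import numpy as np

# Min-max scaling: x_scaled = (x - x.min()) / (x.max() - x.min()),
# which maps the column onto the fixed range [0, 1].
x = np.array([2.0, 4.0, 6.0, 10.0])
scaled = (x - x.min()) / (x.max() - x.min())
print(scaled)  # [0.   0.25 0.5  1.  ]
```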

Combining our Data¶

We were able to concatenate some of our training data, and most of our prediction data, into large training and prediction data frames where each dataset is referenced by one corresponding coordinate value. The final dimensions of the training and prediction data frames were (43, 18) and (662, 10) respectively. These datasets need to align to be used in machine learning, and these do not. Furthermore, our nightlights data became so small after removing NaN values that the datasets do not seem to contain enough corresponding information for the model to produce a useful result.
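One way to get coordinate-aligned frames, rather than the index-label alignment that `pd.concat` performs, would be to merge on shared lat/lon columns. A small sketch with made-up values:

```python
import pandas as pd

# Two toy frames sharing one coordinate pair.
left = pd.DataFrame({'lat': [34.995, 35.045], 'lon': [-76.875, -76.825],
                     'bathymetry': [-4.0, 9.0]})
right = pd.DataFrame({'lat': [34.995], 'lon': [-76.875],
                      'night_lights_0': [148.0]})

# An inner merge on the coordinate columns keeps only rows where the
# same lat/lon appears in both datasets.
merged = left.merge(right, on=['lat', 'lon'], how='inner')
print(merged)
```

In practice the grids would also need to be snapped to a common resolution first, since different rasters rarely share exact coordinates.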

In [16]:
training_data_df = pd.concat([clipped_bathyemtry_df, clipped_night_lights_df], axis=1, join='inner')
training_data_df
Out[16]:
lat lon bathymetry bathymetry2 lat lon night_lights_11 night_lights_10 night_lights_9 night_lights_8 night_lights_7 night_lights_6 night_lights_5 night_lights_4 night_lights_3 night_lights_2 night_lights_1 night_lights_0
1236 34.995 -76.875 -4.0 3.10 35.370833 -75.995833 158.0 1.0 0.632911 2.152510 0.675676 0.649351 1.0 154.0 0.626740 1.581050 1.0 148.0
1237 34.995 -76.825 9.0 3.21 35.295833 -75.483333 153.0 1.0 0.653595 0.728569 0.649351 0.598802 1.0 167.0 1.858990 0.973554 1.0 154.0
1239 34.995 -76.725 -4.0 3.42 35.241667 -75.437500 173.0 1.0 0.578035 3.247720 0.645161 0.625000 1.0 160.0 1.212550 0.968463 1.0 155.0
1240 34.995 -76.675 -4.0 3.52 35.108333 -76.212500 161.0 1.0 0.621118 1.097730 0.729927 0.724638 1.0 138.0 4.485700 1.722900 1.0 137.0
1238 34.995 -76.775 7.0 3.31 35.245833 -75.983333 144.0 1.0 0.694444 4.325350 0.662252 0.606061 1.0 165.0 0.805869 2.607260 1.0 151.0
1241 34.995 -76.625 1.0 3.63 35.075000 -76.233333 143.0 1.0 0.699301 1.348180 0.694444 0.680272 1.0 147.0 3.390600 2.383630 1.0 144.0
1243 34.995 -76.525 4.0 3.84 34.983333 -76.841667 145.0 2.0 1.379310 1.216530 0.625000 0.699301 1.0 143.0 1.565910 1.261425 1.0 160.0
1242 34.995 -76.575 -2.0 3.73 34.987500 -76.683333 161.0 1.0 0.621118 1.594133 1.169591 0.671141 1.0 149.0 0.519020 1.332450 2.0 171.0
1246 34.995 -76.375 -2.0 3.82 34.662500 -76.620833 170.0 1.0 0.588235 1.820485 1.190476 0.571429 1.0 175.0 1.008930 1.833460 2.0 168.0
1245 34.995 -76.425 1.0 3.85 34.662500 -76.641667 146.0 1.0 0.684932 5.059670 1.226994 1.428571 2.0 140.0 0.963060 1.243540 2.0 163.0
1244 34.995 -76.475 1.0 3.88 34.970833 -76.691667 161.0 1.0 0.621118 2.693530 1.388889 0.584795 1.0 171.0 1.307070 9.872280 2.0 144.0
1247 34.995 -76.325 2.0 3.79 34.637500 -76.704167 140.0 1.0 0.714286 4.654010 0.625000 0.602410 1.0 166.0 0.881691 1.153210 1.0 160.0
1248 34.995 -76.275 -2.0 3.76 34.637500 -76.691667 175.0 1.0 0.571429 5.231735 1.342282 0.591716 1.0 169.0 1.433960 2.131550 2.0 149.0
1250 34.995 -76.175 -2.0 3.70 34.633333 -76.662500 152.0 5.0 3.289474 3.089263 1.438849 5.194805 8.0 154.0 4.062875 8.129422 2.0 139.0
1249 34.995 -76.225 -2.0 3.73 34.637500 -76.666667 164.0 1.0 0.609756 0.963250 0.613497 0.621118 1.0 161.0 1.994900 1.814420 1.0 163.0
1252 34.995 -76.075 -14.0 3.64 34.633333 -76.641667 173.0 2.0 1.156069 1.533820 0.613497 2.684564 4.0 149.0 7.465155 6.197050 1.0 163.0
1254 34.995 -75.975 -20.0 3.71 34.629167 -76.662500 149.0 3.0 2.013423 3.243350 1.250000 2.395210 4.0 167.0 4.313485 6.208063 2.0 160.0
1255 34.995 -75.925 -24.0 3.91 34.629167 -76.658333 170.0 2.0 1.176471 1.752170 1.149425 1.190476 2.0 168.0 5.904060 5.275055 2.0 174.0
1253 34.995 -76.025 -18.0 3.61 34.629167 -76.666667 143.0 3.0 2.097902 1.457000 0.606061 0.561798 1.0 178.0 0.926423 2.515010 1.0 165.0
1256 34.995 -75.875 -23.0 4.11 34.629167 -76.545833 154.0 2.0 1.298701 4.203930 2.054795 3.636364 6.0 165.0 2.029050 3.613420 3.0 146.0
1258 34.995 -75.775 -23.0 4.51 34.625000 -76.545833 155.0 3.0 1.935484 2.488250 2.409639 2.484472 4.0 161.0 2.433551 4.476250 4.0 166.0
1257 34.995 -75.825 -22.0 4.31 34.629167 -76.541667 144.0 1.0 0.694444 1.608288 2.439024 0.763359 1.0 131.0 1.006730 1.130850 4.0 164.0
1251 34.995 -76.125 -8.0 3.67 34.633333 -76.650000 168.0 3.0 1.785714 1.708440 0.581395 0.662252 1.0 151.0 3.089410 2.613020 1.0 172.0
1275 34.995 -74.925 -2696.0 7.34 33.766667 -78.112500 180.0 2.0 1.111111 5.788855 2.439024 3.048780 5.0 164.0 4.403730 3.298330 4.0 164.0
1274 34.995 -74.975 -2523.0 7.26 33.766667 -78.116667 151.0 1.0 0.662252 1.798955 1.219512 2.469136 4.0 162.0 5.986400 9.390680 2.0 164.0
1273 34.995 -75.025 -2336.0 7.14 33.766667 -78.120833 163.0 3.0 1.840491 1.516883 1.818182 0.645161 1.0 155.0 10.077600 4.415436 3.0 165.0
1272 34.995 -75.075 -1877.0 6.98 33.766667 -78.129167 186.0 1.0 0.537634 2.716830 1.242236 0.613497 1.0 163.0 4.347810 3.493350 2.0 161.0
1271 34.995 -75.125 -1084.0 6.82 33.770833 -78.108333 149.0 3.0 2.013423 3.398130 2.702703 1.875000 3.0 160.0 4.667990 3.453410 4.0 148.0
1270 34.995 -75.175 -669.0 6.65 33.770833 -78.112500 155.0 1.0 0.645161 5.721062 5.263158 5.194805 8.0 154.0 2.589825 1.724290 9.0 171.0
1264 34.995 -75.475 -46.0 5.68 33.775000 -78.112500 170.0 7.0 4.117647 4.695288 4.137931 2.500000 4.0 160.0 3.443953 3.956602 6.0 145.0
1266 34.995 -75.375 -72.0 6.01 33.775000 -78.104167 163.0 4.0 2.453988 14.539200 0.684932 0.662252 1.0 151.0 5.736020 5.295053 1.0 146.0
1265 34.995 -75.425 -57.0 5.84 33.775000 -78.108333 169.0 1.0 0.591716 3.365845 2.531646 2.395210 4.0 167.0 5.020945 4.153960 4.0 158.0
1263 34.995 -75.525 -37.0 5.50 33.775000 -78.116667 158.0 4.0 2.531646 4.783703 4.285714 5.405406 8.0 148.0 3.272580 3.042767 6.0 140.0
1262 34.995 -75.575 -34.0 5.30 33.779167 -78.112500 152.0 4.0 2.631579 4.537225 1.250000 0.625000 1.0 160.0 4.622490 3.012492 2.0 160.0
1276 34.995 -74.875 -2729.0 7.43 33.758333 -78.125000 174.0 1.0 0.574713 1.493260 0.649351 1.242236 2.0 161.0 1.791252 1.315410 1.0 154.0
1277 34.995 -74.825 -2777.0 7.51 33.758333 -78.120833 152.0 1.0 0.657895 4.296430 0.617284 0.729927 1.0 137.0 9.000360 7.137050 1.0 162.0
1259 34.995 -75.725 -26.0 4.71 34.620833 -76.545833 181.0 2.0 1.104972 1.419470 0.649351 2.424242 4.0 165.0 2.353261 4.291419 1.0 154.0
1261 34.995 -75.625 -29.0 5.10 33.779167 -78.120833 154.0 1.0 0.649351 2.091390 0.641026 0.657895 1.0 152.0 3.779270 2.133860 1.0 156.0
1268 34.995 -75.275 -176.0 6.33 33.770833 -78.120833 171.0 2.0 1.169591 2.983884 3.048780 4.848485 8.0 165.0 4.723375 6.417840 5.0 164.0
1269 34.995 -75.225 -384.0 6.49 33.770833 -78.116667 156.0 4.0 2.564103 2.785440 5.161290 8.536586 14.0 164.0 3.702729 6.649830 8.0 155.0
1267 34.995 -75.325 -99.0 6.17 33.770833 -78.129167 175.0 1.0 0.571429 5.039730 0.632911 0.584795 1.0 171.0 5.402920 2.577010 1.0 158.0
1234 34.995 -76.975 12.0 2.89 36.454167 -75.825000 163.0 1.0 0.613497 6.238815 1.408451 0.628931 1.0 159.0 0.980255 1.546770 2.0 142.0
1235 34.995 -76.925 -4.0 3.00 35.779167 -75.545833 168.0 7.0 4.166667 1.015940 0.595238 1.935484 3.0 155.0 2.316347 3.551703 1.0 168.0
In [18]:
prediction_df2 = pd.concat(
    [cw_3_df, cw_4_df, cw_5_df, db_1_df, db_3_df, db_4_df, db_5_df, db_6_df],
    axis=1, join='inner'
)
prediction_df2
Out[18]:
spatial_ref currents_and_waves3 spatial_ref currents_and_waves4 spatial_ref currents_and_waves5 spatial_ref dbseabed_materials1 spatial_ref dbseabed_materials3 spatial_ref dbseabed_materials4 spatial_ref dbseabed_materials5 spatial_ref dbseabed_materials6
band y x
1 36.545 -76.025 0 -0.01 0 0.01 0 0.00 0 10.4 0 56.4 0 2.46 0 0.0 0 0.78
-75.825 0 -0.01 0 0.06 0 0.63 0 26.2 0 70.7 0 0.30 0 0.0 0 1.60
-75.775 0 -0.03 0 0.14 0 0.57 0 1.5 0 100.0 0 1.50 0 0.0 0 0.50
-75.725 0 -0.03 0 0.14 0 0.63 0 1.5 0 100.0 0 1.50 0 0.0 0 0.50
-75.675 0 -0.03 0 0.16 0 0.75 0 1.5 0 100.0 0 1.50 0 0.0 0 0.50
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
32.045 -74.725 0 -0.01 0 -0.06 0 0.00 0 13.2 0 41.0 0 3.22 0 0.0 0 0.76
-74.675 0 -0.02 0 -0.12 0 0.00 0 11.6 0 45.5 0 3.05 0 0.0 0 0.80
-74.625 0 -0.02 0 -0.12 0 0.00 0 7.4 0 41.2 0 3.31 0 0.0 0 0.73
-74.575 0 -0.01 0 -0.15 0 0.00 0 7.4 0 42.9 0 3.24 0 0.0 0 0.73
-74.525 0 -0.01 0 -0.15 0 0.00 0 12.7 0 59.3 0 2.25 0 0.0 0 0.86

6245 rows × 16 columns

In [19]:
clipped_prediction_gdf2 = clip_gdf(prediction_df2)
clipped_prediction_gdf2
Out[19]:
band y x spatial_ref currents_and_waves3 spatial_ref currents_and_waves4 spatial_ref currents_and_waves5 spatial_ref dbseabed_materials1 spatial_ref dbseabed_materials3 spatial_ref dbseabed_materials4 spatial_ref dbseabed_materials5 spatial_ref dbseabed_materials6 geometry
6159 1 32.045 -78.775 0 -0.07 0 -0.06 0 0.00 0 41.9 0 51.5 0 0.01 0 3.77 0 1.25 POINT (-78.77500 32.04500)
6157 1 32.045 -78.875 0 -0.06 0 -0.10 0 0.00 0 24.8 0 64.3 0 0.74 0 0.00 0 1.25 POINT (-78.87500 32.04500)
6158 1 32.045 -78.825 0 -0.07 0 -0.06 0 0.00 0 45.8 0 47.3 0 -0.04 0 4.84 0 1.29 POINT (-78.82500 32.04500)
6029 1 32.095 -78.825 0 -0.07 0 -0.06 0 0.00 0 21.8 0 65.0 0 1.01 0 0.00 0 1.28 POINT (-78.82500 32.09500)
6030 1 32.095 -78.775 0 -0.07 0 -0.06 0 0.00 0 19.6 0 69.9 0 0.95 0 0.00 0 1.17 POINT (-78.77500 32.09500)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
70 1 36.445 -75.375 0 -0.02 0 0.16 0 1.14 0 1.5 0 100.0 0 1.50 0 0.00 0 0.50 POINT (-75.37500 36.44500)
68 1 36.445 -75.475 0 -0.02 0 0.16 0 1.06 0 15.5 0 72.0 0 1.50 0 0.00 0 0.50 POINT (-75.47500 36.44500)
71 1 36.445 -75.325 0 -0.02 0 0.16 0 1.11 0 100.0 0 1.5 0 -2.00 0 0.00 0 2.40 POINT (-75.32500 36.44500)
39 1 36.495 -75.425 0 -0.02 0 0.16 0 1.04 0 1.5 0 100.0 0 1.50 0 0.00 0 0.50 POINT (-75.42500 36.49500)
40 1 36.495 -75.375 0 -0.02 0 0.16 0 1.32 0 5.9 0 91.2 0 1.50 0 0.00 0 0.50 POINT (-75.37500 36.49500)

5306 rows × 20 columns

In [21]:
clipped_prediction_df
Out[21]:
lat lon currents_and_waves3 currents_and_waves4 currents_and_waves5 dbseabed_materials3 dbseabed_materials4 dbseabed_materials5 dbseabed_materials6
6159 32.045 -78.775 -0.07 -0.06 0.00 51.5 0.01 3.77 1.25
6157 32.045 -78.875 -0.06 -0.10 0.00 64.3 0.74 0.00 1.25
6158 32.045 -78.825 -0.07 -0.06 0.00 47.3 -0.04 4.84 1.29
6029 32.095 -78.825 -0.07 -0.06 0.00 65.0 1.01 0.00 1.28
6030 32.095 -78.775 -0.07 -0.06 0.00 69.9 0.95 0.00 1.17
... ... ... ... ... ... ... ... ... ...
70 36.445 -75.375 -0.02 0.16 1.14 100.0 1.50 0.00 0.50
68 36.445 -75.475 -0.02 0.16 1.06 72.0 1.50 0.00 0.50
71 36.445 -75.325 -0.02 0.16 1.11 1.5 -2.00 0.00 2.40
39 36.495 -75.425 -0.02 0.16 1.04 100.0 1.50 0.00 0.50
40 36.495 -75.375 -0.02 0.16 1.32 91.2 1.50 0.00 0.50

5306 rows × 9 columns

NearestNeighbor¶

Once the data is conformed, two large sets of data are merged to perform machine learning on. One set trains the model; in the other, we look for correlations. These are known as the "training data" and "testing data." The NearestNeighbors analysis from scikit-learn is a good starter machine learning model for this situation because it measures the spread or distribution of something over a geographical space. We were not able to use our own datasets for this task, so we ran an example instead. Below is sample code for a scikit-learn machine learning model, using data that comes pre-wrangled with scikit-learn for practicing on.

X_train and X_test refer to the feature data that the model learns from; y_train and y_test are the target labels that correspond to the 'X' data. When the model is evaluated, the predicted labels (y) are compared against the true labels. We picked an average-sized 'k'; k is a parameter that represents the number of neighbors to consider when making predictions. A smaller 'k' may be sensitive to noise, while a larger 'k' can lead to bias. Once we run the algorithm we can compute accuracy and see that on this dataset the accuracy is 1.0, meaning the model achieved perfect classification on the test data. This is great, but a score like this in a real-data setting may be too good to be true and raise suspicion; it may indicate data issues. We would expect our data to yield a significantly lower accuracy score.

In [22]:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target labels

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a k-NN classifier
k = 3  # Number of neighbors
knn_classifier = KNeighborsClassifier(n_neighbors=k)

# Train the model
knn_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Accuracy: 1.00
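One way to sanity-check a suspiciously perfect score (a sketch we did not run in our original notebook) is k-fold cross-validation, which averages accuracy over several different train/test splits instead of relying on a single one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: fit and score the k-NN model on five different splits
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print("fold accuracies:", scores.round(3))
print("mean accuracy:", scores.mean().round(3))
```

If the cross-validated mean sits noticeably below the single-split score, the 1.0 was likely an artifact of that particular split rather than a property of the model.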

Docker¶

One of the first issues we encountered during this project was hardware-related. One of our computers could not solve the conda environment (one of us works on a Mac, the other on Windows), so we had to build a Docker container. A Docker container is a lightweight, standalone software package that includes everything needed to act, essentially, as a virtual computer: an image is first built containing the environment details, and a container is then run from that image. Docker provides a consistent, reliable way to run applications across different environments and is very popular in software testing and development. It also meant we had to learn the ins and outs of troubleshooting the platform, which added a layer of complexity to our project.
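To illustrate the image-then-container workflow, a minimal Dockerfile for a conda-based Jupyter project might look like the following sketch. The base image and file names here are generic placeholders, not our exact setup:

```dockerfile
# Illustrative example only -- not our project's actual image
FROM continuumio/miniconda3

WORKDIR /app

# Install the analysis environment from a conda environment file
COPY environment.yml .
RUN conda env update -n base -f environment.yml && conda clean -afy

# Copy the notebooks in and start Jupyter, listening on all interfaces
COPY . .
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
```

The image is built once (`docker build -t seafloor .`) and an identical container can then be run on either operating system (`docker run -p 8888:8888 seafloor`), which is what resolves the Mac/Windows environment mismatch.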

Once we had all of our data imported into our notebook, we began to run into memory issues. We had so many different datasets that concatenating them consumed a huge amount of memory and caused problems inside the Docker container. It became very difficult to run the notebook all the way through, and it took constant finesse to keep both the computer and the container running.
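One common mitigation for this kind of problem, sketched here with a hypothetical dataframe (the column names are made up, not our schema), is downcasting wide numeric columns before concatenating:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for one of our datasets
df = pd.DataFrame({
    "lat": np.random.uniform(32.0, 36.5, 100_000),
    "lon": np.random.uniform(-79.0, -75.0, 100_000),
    "depth_m": np.random.uniform(0.0, 120.0, 100_000),
})

before = df.memory_usage(deep=True).sum()

# Downcast each float64 column to the smallest float dtype that fits (float32),
# roughly halving the numeric footprint before any concatenation
for col in df.select_dtypes("float64"):
    df[col] = pd.to_numeric(df[col], downcast="float")

after = df.memory_usage(deep=True).sum()
print(f"memory: {before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
```

Applied to every input frame before `pd.concat`, this kind of dtype trimming can meaningfully lower the peak memory a container needs.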

Below is an image of our Docker image commands.

[Image: screenshot of our Docker image commands]

Final Thoughts¶

The ocean is massively important to our planet. Due to its size and depth, tracking what happens on the seafloor has traditionally been very hard. Cutting-edge methods such as nearest-neighbor models can help us develop environmental models that aid sustainable fishing practices, enhance navigational safety, improve our understanding of climate change impacts, and much more. Just last year a report stated that a machine learning model for glaciers, the Instructed Glacier Model, was developed to predict the evolution of glaciers and ice sheets up to 1,000 times faster than previous methods (Columbia Climate School, 2022). This extreme speedup comes from handing the heavy physics of the modeling over to AI, which processes it much faster. That model will greatly enhance our ability to predict sea level rise, and it is only one of many examples of how this type of technology may positively impact our society.

Our particular project was fraught with problems and many frustrations. We learned the hard way near the end of the term that we had set very lofty goals and would not be able to complete our initial mission: to employ a machine-learning model that analyzes what may happen to sunken objects. We had so many varied datasets and so much complex data cleaning and manipulation to do that our project really became a lesson in how to download and manipulate various types of data frames. This summer was a fascinating and frustrating delve into cloud-based computing, data wrangling, machine learning, and ocean-floor investigation. It wasn't how we saw our project going, but, disappointment aside, that is all too often what happens in science. To quote the famed American inventor Thomas Edison: "I haven't failed, I've found 10,000 ways that don't work."

Want to learn more?¶

Please visit our GitHub repository for more technical information and to view our software:

https://github.com/rmarowitz/seafloor-objects¶

You may also contact us at:

allisonwiddecombe@gmail.com or¶

Robyn.Marowitz@colorado.edu¶

Thank you for your interest!¶

Citations¶

Ardhuin, F., Herbers, T. H. C., van Vledder, G. Ph., Watts, K. P., Jensen, R., & Graber, H. C. (2007). Swell and slanting-fetch effects on wind wave growth. Journal of Physical Oceanography, 37(4), 908–931.

Becker, J.J., D.T. Sandwell, W.H.F. Smith, J. Braud, B. Binder, J. Depner, D. Fabre, J. Factor, S. Ingalls, S.-H. Kim, R. Ladner, K. Marks, S. Nelson, A. Pharaoh, R. Trimmer, J. Von Rosenberg, G. Wallace, and P. Weatherall (2009) Global Bathymetry and Elevation Data at 30 Arc Seconds Resolution: SRTM30_PLUS, Marine Geodesy, 32:4, 355-371, http://dx.doi.org/10.1080/01490410903297766.

Byrum, J., & Hendrix, N. (n.d.). Artificial Reefs. North Carolina Environmental Quality. Retrieved April 18, 2023, from https://www.deq.nc.gov/about/divisions/marine-fisheries/public-information-and-education/coastal-fishing-information/artificial-reefs

Coz, J. (n.d.). Artificial Reefs - Data Files. Retrieved April 29, 2023, from https://www.dnr.sc.gov/marine/reef/

"Learn about Ocean Dumping," epa.gov. October 27, 2022.

"Machine Learning Techniques Can Speed Up Glacier Modeling By A Thousand Times," Glacierhub Blog, Columbia Climate School. March 25, 2022.

National Oceanic and Atmospheric Administration (n.d.). Wrecks and Obstructions Database. U.S. Office of Coast Survey. Retrieved April 21, 2023, from https://nauticalcharts.noaa.gov/data/wrecks-and-obstructions.html

Night Lights data was shared with us by the Colorado School of Mines and processed by the Earth Observation Group, Payne Institute for Public Policy:

Hsu, F.C., Elvidge, C.D., Baugh, K., Zhizhin, M., Ghosh, T., Kroodsma, D., Susanto, A., Budy, W., Riyanto, M., Nurzeha, R. and Sudarja, Y., 2019. Cross-matching VIIRS boat detections with vessel monitoring system tracks in Indonesia. Remote Sensing, 11(9), p.995.

Elvidge, C.D., Ghosh, T., Baugh, K., Zhizhin, M., Hsu, F.C., Katada, N.S., Penalosa, W. and Hung, B.Q., 2018. Rating the effectiveness of fishery closures with visible infrared imaging radiometer suite boat detection data. Frontiers in Marine Science, 5, p.132.

Elvidge, C.D., Zhizhin, M., Baugh, K. and Hsu, F.C., 2015. Automatic boat identification system for VIIRS low light imaging data. Remote sensing, 7(3), pp.3020-3036.